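In differentiable neural architecture search (NAS) algorithms such as DARTS, the training set used to update the model weights and the validation set used to update the model architecture are sampled from the same data distribution. As a result, rare features in the dataset do not receive enough attention during training. In this paper, rather than introducing a more complex NAS algorithm, we explore the idea that adding high-quality synthesized datasets to training can help the classification model identify its weaknesses and improve recognition accuracy. We introduce a training strategy called "Differentiable Architecture Search with a Generative Model (DASGM)". In DASGM, the training set is used to update the classification model weights, while a synthesized dataset is used to train its architecture. The generated images have a different distribution from the training set, which can help the classification model learn better features and identify its weaknesses. We formulate DASGM as a multi-level optimization framework and develop an effective algorithm to solve it. Experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate the effectiveness of DASGM. Code will be made available.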
Translated by Google Translate
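Robotics has long been a field riddled with complex system architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge. Inspired by large pre-trained language models, this work introduces a paradigm for pre-training a general-purpose representation that can serve as a starting point for multiple tasks on a given robot. We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions, our model implicitly encodes the dynamics and behaviors of a particular robot. Our experimental evaluation focuses on the domain of mobile agents, where we show that this robot-specific representation can function as a single starting point for distinct tasks such as safe navigation, localization, and mapping. We evaluate two form factors: a wheeled robot that uses a LiDAR sensor as perception input (MuSHR), and a simulated agent that uses first-person RGB images (Habitat). We show that fine-tuning small task-specific networks on top of the larger pretrained model yields significantly better performance than training a single model from scratch for all tasks simultaneously, and performance comparable to training a separate large model for each task independently. By sharing a common, good-quality representation across tasks, we can lower overall model capacity and speed up the real-time deployment of such systems.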
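Simulating realistic sensors is a challenging part of data generation for autonomous systems, often involving carefully handcrafted sensor design, scene properties, and physics modeling. To alleviate this, we introduce a pipeline for data-driven simulation of a realistic LiDAR sensor. We propose a model that learns a mapping between RGB images and corresponding LiDAR features, such as raydrop or per-point intensities, directly from real datasets. We show that our model can learn to encode realistic effects such as dropped points on transparent surfaces or high-intensity returns on reflective materials. When applied to naively raycasted point clouds provided by off-the-shelf simulator software, our model enhances the data by predicting intensities and removing points based on the scene's appearance to match a real LiDAR sensor. We use our technique to learn models of two distinct LiDAR sensors and use them to improve simulated LiDAR data accordingly. Through a sample task of vehicle segmentation, we show that enhancing simulated point clouds with our technique improves downstream task performance.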
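Data augmentation is an important component both in evaluating the robustness of natural language processing (NLP) models and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available on the NL-Augmenter repository (\url{https://github.com/gem-benchmark/nl-augmenter}).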
3D object detection is vital because it captures objects' sizes, orientations, and positions in the world. This makes it useful in real-world applications such as augmented reality (AR), self-driving cars, and robotics, which need to perceive the world the way humans do. Monocular 3D object detection is the task of drawing 3D bounding boxes around objects in a single 2D RGB image. It is a localization task that has no access to extra information such as depth, other sensors, or multiple images, which makes it important yet challenging. Despite significant progress in image-based 2D object detection, 3D understanding of real-world objects remains an open challenge that has not been explored extensively thus far. In addition to the most closely related studies.
Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases (i.e., 3D segmentations defined by a group of NeRF-type functions) that are shared across the scene. While our palette-based bases are view-independent, we also predict a view-dependent function to capture the color residual (e.g., specular shading). During training, we jointly optimize the basis functions and the color palettes, and we also introduce novel regularizers to encourage the spatial coherence of the decomposition. Our method allows users to efficiently edit the appearance of the 3D scene by modifying the color palettes. We also extend our framework with compressed semantic features for semantic-aware appearance editing. We demonstrate that our technique is superior to baseline methods both quantitatively and qualitatively for appearance editing of complex real-world scenes.
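The palette-based decomposition described above can be illustrated with a minimal sketch. It assumes the per-point blending weights and the view-dependent residual are already given as arrays (in the method they are predicted by learned functions); the function name `compose_color` and the toy values are hypothetical and only show how editing a shared palette recolors every point that uses it.

```python
import numpy as np

def compose_color(weights, palette, residual):
    """Blend shared palette colors per point and add a view-dependent residual.

    weights:  (N, K) per-point blending weights (assumed given here)
    palette:  (K, 3) RGB palette shared across the scene
    residual: (N, 3) view-dependent color residual (e.g. specular shading)
    """
    return np.clip(weights @ palette + residual, 0.0, 1.0)

# Toy data: two points, two palette colors (red and blue), no residual.
weights = np.array([[1.0, 0.0],
                    [0.5, 0.5]])
palette = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
residual = np.zeros((2, 3))

original = compose_color(weights, palette, residual)

# Appearance edit: swap the first palette entry from red to green.
# The weights and the residual are reused unchanged.
edited_palette = palette.copy()
edited_palette[0] = [0.0, 1.0, 0.0]
edited = compose_color(weights, edited_palette, residual)
```

Because the weights stay fixed, a single palette change propagates to all points in proportion to how much they draw on that palette entry.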
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if crowdsourced and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters the generation to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples, on three evaluation sets written by human workers and via human-AI collaboration.
Recent work has shown that large language models are capable of generating natural language reasoning steps or Chains-of-Thoughts (CoT) to answer a multi-step question when prompted to do so. This is insufficient, however, when the necessary knowledge is not available or up-to-date within a model's parameters. A straightforward approach to address this is to retrieve text from an external knowledge source using the question as a query and prepend it as context to the model's input. This, however, is also insufficient for multi-step QA where \textit{what to retrieve} depends on \textit{what has already been derived}. To address this issue we propose IRCoT, a new approach that interleaves retrieval with CoT for multi-step QA, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. Our experiments with GPT3 show substantial improvements in retrieval (up to 22 points) and downstream QA (up to 16 points) over the baselines on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. Notably, our method also works well for much smaller models such as T5-Flan-large (0.7B) without any additional training.
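The interleaving of retrieval and reasoning described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: the names `ircot`, `toy_retrieve`, and `toy_next_step` are hypothetical, the retriever is a word-overlap scorer over a three-sentence made-up corpus, the rule-based `toy_next_step` stands in for a language model, and using the latest reasoning step as the next query is an assumed design choice.

```python
import re

# Made-up corpus; all names and facts are hypothetical.
CORPUS = [
    "Alice Example was born in Springfield.",
    "Springfield is a city in Freedonia.",
    "Bananas are rich in potassium.",
]
STOPWORDS = {"a", "in", "is", "was", "which", "the", "so", "are"}

def tokens(text):
    """Lowercase word set without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def toy_retrieve(query, k=2):
    """Return up to k corpus paragraphs that share at least one word with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(p)), p) for p in CORPUS]
    scored = [sp for sp in scored if sp[0] > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]

def toy_next_step(question, paragraphs, cot):
    """Rule-based stand-in for a language model producing the next reasoning step."""
    context = " ".join(paragraphs)
    if "Freedonia" in context:
        return "Springfield is in Freedonia, so the answer is Freedonia."
    if "born in Springfield" in context:
        return "Alice Example was born in Springfield."
    return "I need more information."

def ircot(question, retrieve, next_step, max_steps=5):
    """Alternate between generating one reasoning step and retrieving with it."""
    paragraphs = list(retrieve(question))  # first retrieval uses the question
    cot = []
    for _ in range(max_steps):
        step = next_step(question, paragraphs, cot)
        cot.append(step)
        if "answer is" in step.lower():
            break
        # The new reasoning step guides the next retrieval.
        for p in retrieve(step):
            if p not in paragraphs:
                paragraphs.append(p)
    return cot, paragraphs

question = "In which country was Alice Example born?"
cot, paragraphs = ircot(question, toy_retrieve, toy_next_step)
```

In this toy run the question alone retrieves only the birthplace sentence; the sentence about Freedonia is reachable only after the first reasoning step mentions Springfield, which is the dependency between what to retrieve and what has already been derived.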
The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In this paper, we show this factorization can be combined with regression on a continuous response variable. In practice, the method performs better than regression done after topics are identified and retains interpretability.
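As a minimal sketch of the factorization step alone (not the paper's combined regression method), the following approximates a small nonnegative document-term matrix as a product of two lower-rank nonnegative matrices. It assumes a Frobenius-norm objective and uses the standard multiplicative update rules; the function name `nmf`, the toy matrix, and the iteration count are illustrative choices.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative X (m x n) as W @ H, with W (m x rank) and H (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy document-term counts: rows are documents, columns are terms.
# Documents 0-1 mostly use terms 0-1; documents 2-3 mostly use terms 2-3.
X = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)

W, H = nmf(X, rank=2)  # W: document-topic weights, H: topic-term weights
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each row of `H` can be read as a topic over terms and each row of `W` as a document's topic weights, which is where the interpretability mentioned above comes from.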